Towards Global Optimal Visual In-Context Learning Prompt Selection

Neural Information Processing Systems

Visual In-Context Learning (VICL) is a prevailing way to transfer visual foundation models to new tasks by leveraging contextual information contained in in-context examples to enhance learning and prediction of query samples.




On the Relationship Between Binary Classification, Bipartite Ranking, and Binary Class Probability Estimation

Neural Information Processing Systems

We investigate the relationship between three fundamental problems in machine learning: binary classification, bipartite ranking, and binary class probability estimation (CPE). It is known that a good binary CPE model can be used to obtain a good binary classification model (by thresholding at 0.5), and also to obtain a good bipartite ranking model (by using the CPE model directly as a ranking model); it is also known that a binary classification model does not necessarily yield a CPE model. However, not much is known about other directions. Formally, these relationships involve regret transfer bounds. In this paper, we introduce the notion of weak regret transfer bounds, where the mapping needed to transform a model from one problem to another depends on the underlying probability distribution (and in practice, must be estimated from data). We then show that, in this weaker sense, a good bipartite ranking model can be used to construct a good classification model (by thresholding at a suitable point), and more surprisingly, also to construct a good binary CPE model (by calibrating the scores of the ranking model).
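The two transfer directions highlighted above can be made concrete with a small sketch: a distribution-dependent monotone calibration (here isotonic regression via pool-adjacent-violators, an illustrative stand-in for the paper's construction, with function names of our choosing) turns ranking scores into class-probability estimates, and thresholding those estimates gives a classifier.

```python
import numpy as np

def calibrate_scores(scores, labels):
    """Turn ranking scores into class-probability estimates via isotonic
    regression (pool-adjacent-violators): fit the best monotone map from
    score order to empirical positive rates. Illustrative stand-in for
    the paper's calibration construction."""
    order = np.argsort(scores)
    vals = list(labels[order].astype(float))   # per-example block values, sorted by score
    wts = list(np.ones(len(vals)))             # block sizes
    i = 0
    while i < len(vals) - 1:
        if vals[i] > vals[i + 1]:              # monotonicity violated: merge blocks
            merged = (vals[i] * wts[i] + vals[i + 1] * wts[i + 1]) / (wts[i] + wts[i + 1])
            vals[i:i + 2] = [merged]
            wts[i:i + 2] = [wts[i] + wts[i + 1]]
            i = max(i - 1, 0)                  # re-check against the previous block
        else:
            i += 1
    probs_sorted = np.repeat(vals, np.array(wts, dtype=int))
    probs = np.empty(len(scores))
    probs[order] = probs_sorted                # map back to original example order
    return probs

def classify(scores, labels, threshold=0.5):
    """Weak transfer: threshold the calibrated probabilities to obtain a
    binary classifier (the map itself depends on the data distribution)."""
    return calibrate_scores(scores, labels) >= threshold
```

Note that when the ranking scores already separate positives from negatives perfectly, the calibrated probabilities collapse to 0 and 1; ties and inversions produce the intermediate pooled values.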


Learning Rich Rankings

Neural Information Processing Systems

Although the foundations of ranking are well established, the ranking literature has primarily been focused on simple, unimodal models, e.g. the Mallows and Plackett-Luce models, that define distributions centered around a single total ordering. Explicit mixture models have provided some tools for modelling multimodal ranking data, though learning such models from data is often difficult. In this work, we contribute a contextual repeated selection (CRS) model that leverages recent advances in choice modeling to bring a natural multimodality and richness to the rankings space. We provide rigorous theoretical guarantees for maximum likelihood estimation under the model through structure-dependent tail risk and expected risk bounds. As a by-product, we also furnish the first tight bounds on the expected risk of maximum likelihood estimators for the multinomial logit (MNL) choice model and the Plackett-Luce (PL) ranking model, as well as the first tail risk bound on the PL ranking model. The CRS model significantly outperforms existing methods for modeling real world ranking data in a variety of settings, from racing to rank choice voting.
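Since the abstract centers on maximum likelihood estimation for Plackett-Luce-style models, a minimal sketch of the PL log-likelihood for full rankings (repeated selection with softmax choice probabilities over the remaining items) may help fix ideas; the function name is ours, and this is not the paper's CRS estimator.

```python
import numpy as np

def plackett_luce_loglik(utilities, rankings):
    """Log-likelihood of full rankings under the Plackett-Luce model:
    items are repeatedly selected from the remaining set with probability
    proportional to exp(utility). Illustrative sketch only."""
    ll = 0.0
    for ranking in rankings:
        u = utilities[list(ranking)]           # utilities in chosen order
        for pos in range(len(ranking) - 1):    # last selection has probability 1
            rest = u[pos:]                     # utilities of items still available
            m = rest.max()                     # stabilised log-sum-exp
            ll += rest[0] - (np.log(np.sum(np.exp(rest - m))) + m)
    return ll
```

Maximising this quantity over the utility vector (e.g. by gradient ascent) gives the MLE whose tail and expected risk the paper bounds.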


On A Mallows-type Model For (Ranked) Choices

Neural Information Processing Systems

We consider a preference learning setting where every participant chooses an ordered list of $k$ most preferred items among a displayed set of candidates.


Hierarchical Ranking Neural Network for Long Document Readability Assessment

Zheng, Yurui, Chen, Yijun, Zhang, Shaohong

arXiv.org Artificial Intelligence

Readability assessment aims to evaluate the reading difficulty of a text. In recent years, while deep learning technology has been gradually applied to readability assessment, most approaches fail to consider either the length of the text or the ordinal relationship of readability labels. This paper proposes a bidirectional readability assessment mechanism that captures contextual information to identify regions with rich semantic information in the text, thereby predicting the readability level of individual sentences. These sentence-level labels are then used to assist in predicting the overall readability level of the document. Additionally, a pairwise sorting algorithm is introduced to model the ordinal relationship between readability levels through label subtraction. Experimental results on Chinese and English datasets demonstrate that the proposed model achieves competitive performance and outperforms other baseline models.

Introduction: Automatic Readability Assessment (ARA) research originated in the early 20th century, aiming to evaluate text reading difficulty and assist educators in recommending appropriate reading materials for learners [1]. Readability assessment approaches are generally classified into three paradigms: human evaluation, co-selection-based analysis, and content-based analysis. Human evaluation involves expert annotation or reader surveys; co-selection methods leverage user interaction data such as reading time or choices [2]; and content-based approaches infer readability using linguistic, syntactic, or semantic features extracted from the text itself. Early studies predominantly relied on experts' subjective evaluations and simple statistical features, such as sentence length and word complexity.
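The "ordinal relationship through label subtraction" idea can be sketched as a pairwise loss that compares prediction differences against readability-label differences; this is a hedged illustration with names of our choosing, and the paper's exact formulation may differ.

```python
import numpy as np

def pairwise_ordinal_loss(preds, labels):
    """For every pair of documents, penalise predictions whose difference
    disagrees with the readability-label difference (label subtraction).
    Hedged sketch of a pairwise ordinal objective, not the paper's exact loss."""
    preds = np.asarray(preds, dtype=float)
    labels = np.asarray(labels, dtype=float)
    loss, n = 0.0, 0
    for i in range(len(preds)):
        for j in range(i + 1, len(preds)):
            target = labels[i] - labels[j]           # signed ordinal gap
            loss += (preds[i] - preds[j] - target) ** 2
            n += 1
    return loss / max(n, 1)
```

A perfect ordinal fit drives the loss to zero; flat predictions on differently-labelled pairs are penalised in proportion to the squared label gap.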


Causal Synthetic Data Generation in Recruitment

Iommi, Andrea, Mastropietro, Antonio, Guidotti, Riccardo, Monreale, Anna, Ruggieri, Salvatore

arXiv.org Artificial Intelligence

The importance of Synthetic Data Generation (SDG) has increased significantly in domains where data quality is poor or access is limited due to privacy and regulatory constraints. One such domain is recruitment, where publicly available datasets are scarce due to the sensitive nature of information typically found in curricula vitae, such as gender, disability status, or age. This lack of accessible, representative data presents a significant obstacle to the development of fair and transparent machine learning models, particularly ranking algorithms that require large volumes of data to effectively learn how to recommend candidates. In the absence of such data, these models are prone to poor generalisation and may fail to perform reliably in real-world scenarios. Recent advances in Causal Generative Models (CGMs) offer a promising solution. CGMs enable the generation of synthetic datasets that preserve the underlying causal relationships within the data, providing greater control over fairness and interpretability in the data generation process. In this study, we present a specialised SDG method involving two CGMs: one modelling job offers and the other modelling curricula. Each model is structured according to a causal graph informed by domain expertise. We use these models to generate synthetic datasets and evaluate the fairness of candidate rankings under controlled scenarios that introduce specific biases.
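The core mechanism behind a causal generative model of this kind is ancestral sampling: visit variables in the topological order of the causal graph and draw each one from its structural equation given already-sampled parents. The toy recruitment-style graph below (experience → skill score → shortlisted) is entirely hypothetical and only illustrates the sampling scheme, not the paper's models.

```python
import random

def sample_from_dag(structural_eqs, topo_order, n=1):
    """Ancestral sampling from a causal graph: generate each variable in
    topological order from its structural equation, conditioning on the
    parents sampled so far. Sketch of the general CGM sampling step."""
    rows = []
    for _ in range(n):
        row = {}
        for var in topo_order:
            row[var] = structural_eqs[var](row)   # parents are already in `row`
        rows.append(row)
    return rows

# Hypothetical recruitment DAG: experience -> skill_score -> shortlisted.
eqs = {
    "experience": lambda r: random.randint(0, 20),
    "skill_score": lambda r: min(100.0, 40 + 3 * r["experience"] + random.gauss(0, 5)),
    "shortlisted": lambda r: r["skill_score"] > 70,
}
```

Introducing a controlled bias is then a matter of editing one structural equation (e.g. making `shortlisted` depend on a sensitive attribute) and regenerating the data, which is what makes fairness evaluation under known scenarios possible.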